Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

Claims 1-3 and 13-15 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Basu et al. (2003/0018475) in view of Girod (6,483,532).

As to claim 1, Basu teaches, in an "audio-visual speech detection and recognition
system", a noise reduction system including an audio-visual user interface for combining
visual features extracted from a digital video sequence with audio features extracted
from an analog audio sequence including background noise, the system comprising:

speech sequence detection means for detecting audio signals (Par.0012-0013); 

speech feature extraction and analyzing means (Par.0038, 0042);

video sequence detection means for detecting said video sequence (Par.0010); 

visual feature extraction and analysis means for analyzing the detected video 
sequence and extracting said visual features therefrom (Par.0081); and 

a means to prevent background noise from being processed by the system
based on the derived speech characteristics and to output a speech activity indication
signal based on the combination of the speech detection and video sequence detection
means (Par.0094-0097; abstract; Figs.1, 8-10; Par.0088).

It is noted that Basu does not explicitly teach that the system comprises echo
cancellation means.

Girod, however, teaches, in a video assisted audio signal processing system, a
noise reduction system (Fig.3) for modifying a speech signal, including an audio-visual
user interface for combining visual features extracted from a digital video sequence with 
audio sequence, said system comprising: 

audio signal processing means (324) for processing audio signal; 

video sequence detection means (354) for detecting said video sequence; 

visual feature extraction and analysis means (360) for analyzing the detected 
video sequence and extracting said visual features therefrom; and 

a multi-channel acoustic echo cancellation unit (312) configured to perform a
near-end speaker detection (314) and double-talk detection (318) algorithm based on
the audio analysis means and the visual detection means and to modify the near-end
speech by cancelling echo (noise) in the speech signal (abstract; Figs. 1-4; Col.1, line
50-Col.2, line 38).

It would have been obvious to one of ordinary skill in the art at the time of
applicant's invention to modify Basu's teaching as claimed, in view of Girod, for the
purpose of reliably distinguishing speech that is meant to be processed by
the system from unintended background speech, including acoustic echo, thereby
avoiding false activation of the system.

As to claim 2, Basu teaches enabling/disabling the microphone based on
whether or not the speech energy level detected is below/above a 'given signal level'
(threshold) (Par.0097, 0094, 0096).
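
For illustration only, the kind of energy-threshold gating described for claim 2 can be sketched as follows. This is a minimal sketch and is not taken from Basu: the function names, the decibel scale, the -30 dB default threshold and the 160-sample frame length are all assumptions chosen for the example.

```python
import math

def frame_energy_db(samples):
    """Return the mean-square energy of one audio frame in dB (floored at -120 dB)."""
    if not samples:
        return -120.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))

def microphone_enabled(samples, threshold_db=-30.0):
    """Hypothetical gate: enable the microphone only when the frame energy
    is at or above the given signal level (threshold)."""
    return frame_energy_db(samples) >= threshold_db

# Two synthetic frames: a very quiet one and a loud one (assumed test data).
quiet = [0.001] * 160
loud = [0.5] * 160
print(microphone_enabled(quiet), microphone_enabled(loud))
```

In this sketch a frame below the threshold leaves the microphone disabled, so low-level background sound is never passed on for further processing.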

As to claim 3, Basu teaches where the audio feature extraction and analysis 
means comprises an amplitude detector (Par.0039). 

As to claims 13-15, Basu teaches the corresponding system for reducing noise in
speech using audio features plus visual speech feature vectors, as addressed above for
claim 1 in detail, and Girod teaches where the system disclosed is used in a video
communication/telephony application including a microphone, video camera and speaker
(Figs. 103; Claim 8). Using the Basu system in a video-telephony application would
have been obvious to one skilled in the art for the purpose of reliably detecting
background noise in the communication signal.

Claims 4, 7 and 8 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Basu et al. (2003/0018475) in view of Wynn (5,706,394).

As to claim 4, Basu teaches a method for reducing noise comprising the steps of:

converting analog speech to digital;

an acoustic feature extraction process by Fourier transforming the magnitudes of
discrete samples of speech data (Par.0038-0039, 0042);

detecting speech in an audio signal by analyzing visual features extracted from a
video sequence associated with the audio sequence, including current position of face,
lip or facial expression of the speaker; and

preventing background noise from being processed by the system based on the
derived speech characteristics and outputting a speech activity indication signal based
on the combination of the audio processing and video sequence detection means
(Par.0094-0097; abstract; Figs. 1, 8-10; Par.0088).

Basu doesn't explicitly teach the claimed process of subtracting noise from the 
speech signal. 

Wynn teaches a method for reducing noise in speech, comprising:

estimating a noise power density spectrum of background noise based on a
voice activity detector;

subtracting the estimated noise power from the speech signal; and

inverse transforming the signal into the time domain, where the noise-subtracted
speech signal could be input to a speech recognizer (abstract; Col.1, lines 31-35; Col. 8,
line 65-Col.9, line 11; Col. 16, lines 14-20).

It would have been obvious to one of ordinary skill in the art at the time of
applicant's invention to modify the Basu system in view of Wynn for the
purpose of efficiently removing background noise from the speech signal.
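
For illustration only, the three steps attributed to Wynn above (noise power estimation from noise-only frames, subtraction, inverse transformation) correspond to generic power spectral subtraction, which can be sketched as follows. This is a minimal sketch under stated assumptions and is not Wynn's disclosed implementation: the function names are hypothetical, a naive DFT is used so the example needs only the standard library, and the noise-only frames are assumed to have already been selected by some voice activity detector.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def estimate_noise_power(noise_frames):
    """Average the power spectrum over frames assumed to contain noise only
    (for example, frames a voice activity detector marked as non-speech)."""
    spectra = [dft(frame) for frame in noise_frames]
    bins = len(spectra[0])
    return [sum(abs(s[k]) ** 2 for s in spectra) / len(spectra)
            for k in range(bins)]

def spectral_subtract(frame, noise_power):
    """Subtract the noise power from each bin, clamp at zero, keep the
    noisy phase, and transform back into the time domain."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_power):
        power = abs(value) ** 2
        reduced = max(power - noise, 0.0)
        gain = math.sqrt(reduced / power) if power > 0.0 else 0.0
        cleaned.append(value * gain)
    return idft(cleaned)
```

Clamping the subtracted power at zero is one common design choice for avoiding negative power values; other variants apply a small spectral floor instead.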

As to claim 7, Basu teaches wherein said visual speech characteristics are based
on detecting the face, the opening of the mouth of the speaker, the lips of the speaker,
or other phonetic characteristics associated with position and movement of
the lips (Par.0043-0046, Figs.2-4).

As to claim 8, Basu teaches detecting the voice of the speaker by analyzing
visual features extracted from video sequences associated with the speech, where the
visual features include mouth movement, face, the lips of the speaker, or other
phonetic characteristics associated with position and movement of the lips
(Par.0043-0046, Figs.2-4).

Claims 5, 6, 9 and 10-12 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Basu et al. (2003/0018475) in view of Wynn (5,706,394) and
further in view of Girod (6,483,532).

As to claim 5, Basu teaches where acoustic-phonetic characteristics (visual speech
feature characters) are derived by an algorithm for extracting the visual features from
a video sequence associated with the audio sequence, including movement and position
of the lips or facial expression in an image signal (Par.0081). The step of acoustic echo
cancellation as claimed is not taught by Basu; however, Girod, as addressed above for
claim 1, teaches a near-end acoustic echo signal detection and cancelling process
utilizing the combination of video detection means and audio processing means. The
motivation for combining the two teachings is the same as provided for claim 1.

As to claim 6, Girod teaches where the acoustic echo cancellation process 
includes a double talk detection procedure (Fig.3). 

As to claim 9, Wynn teaches where the noise suppressing method comprises 
comparing the spectrum of, inherently delayed, audio input with a voice activity estimate 
(threshold, TH) obtained by amplitude detection of a filtered discrete signal spectrum to 
provide an estimate for a frequency spectrum corresponding to a signal which 
represents a voice of said speaker as well as an estimate for the noise power density 
spectrum of the statistically distributed background noise (Fig. 13; Col. 14, lines 14-30;
Col. 15, lines 2-20). 

As to claims 10 and 12, Basu teaches a speech present estimation means and
an event detection means where the event detection means comprises the audio
feature vectors, A, extracted from the audio signal and visual speech feature vectors, V,
extracted from visual sequences and which are representative of visual speech, and the
detection is made on the combination of the two sets of feature vectors, i.e., the audio
plus the visual-speech features (Par.0042, 0080).

As to claim 12, Basu teaches where speech activity estimate features and
visual-speech activity estimate features are combined/added to form a single audio
visual-speech feature vector and correlated to audio visual-speech probabilities to make
the detection decision (Par.103-104, 107).

Basu also teaches detecting speech using an energy threshold as discussed above.
Basu, however, does not explicitly teach where the speech/noise estimate is updated as
claimed. Wynn teaches where the speech activity threshold is updated for every frame
according to spectrally estimated noise in the speech signal (Fig. 13), and this process
would have been obvious in the Basu system for the purpose of adjusting the energy
threshold in accordance with the level of the present background noise as well as for
effectively cancelling background noise in the speech signal.
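
For illustration only, a per-frame threshold update of the general kind discussed above can be sketched as follows. This is a minimal sketch and is not the procedure of Wynn's Fig. 13: it tracks a single broadband frame energy rather than a spectral estimate, and the function names, the smoothing factor of 0.9 and the threshold margin of 3.0 are assumptions chosen for the example.

```python
def update_noise_estimate(noise_estimate, frame_energy, is_speech, smoothing=0.9):
    """Hypothetical update rule: leave the noise estimate unchanged during
    speech, otherwise move it toward the current frame energy."""
    if is_speech:
        return noise_estimate
    return smoothing * noise_estimate + (1.0 - smoothing) * frame_energy

def run_detector(frame_energies, initial_noise=0.01, margin=3.0):
    """Classify each frame against a threshold that is recomputed every
    frame from the running noise estimate."""
    noise = initial_noise
    decisions = []
    for energy in frame_energies:
        threshold = margin * noise          # threshold follows the noise level
        is_speech = energy > threshold
        decisions.append(is_speech)
        noise = update_noise_estimate(noise, energy, is_speech)
    return decisions, noise

decisions, final_noise = run_detector([0.01, 0.01, 1.0, 0.01])
print(decisions)
```

Because the threshold is a multiple of the running noise estimate, a gradual rise in background noise raises the threshold with it instead of being classified as speech.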

As to claim 11, adjusting the frequency band of the filtered signal is inherent in
Wynn's teaching (Col. 15, lines 2-20).

Information Disclosure Statement 

The information disclosure statement filed 7/20/2005 fails to comply with 37 CFR
1.98(a)(2), which requires a legible copy of each cited foreign patent document; each
non-patent literature publication or that portion which caused it to be listed; and all other 
information or that portion which caused it to be listed. It has been placed in the 
application file, but the information referred to therein has not been considered.

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Daniel D. Abebe whose telephone number is
571-272-7615. The examiner can normally be reached Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, David Hudspeth, can be reached at 571-272-7843. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



/Daniel D Abebe/ 

Primary Examiner, Art Unit 2626 



